717 research outputs found

    TGSum: Build Tweet Guided Multi-Document Summarization Dataset

    Full text link
    The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large scales of news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyper-link often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this aim, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries could be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as extra training resource, the performance of the summarizer improves a lot on all the test sets. We release this dataset for further research.Comment: 7 pages, 1 figure in AAAI 201

    Inhomogeneous states with checkerboard order in the t-J Model

    Full text link
    We study inhomogeneous states in the t-J model using an unrestricted Gutzwiller approximation. We find that pa×papa\times pa checkerboard order, where pp is a doping dependent number, emerges from Fermi surface instabilities of both the staggered flux phase and the Fermi liquid state with realistic band parameters. In both cases, the checkerboard order develops at wave vectors (±2π/pa,0)(\pm 2\pi/pa,0), (0,±2π/pa)(0,\pm2\pi/pa) that are tied to the peaks of the wave-vector dependent susceptibility, and is of the Lomer-Rice-Scott type. The properties of such periodic, inhomogeneous states are discussed in connection to the checkerboard patterns observed by STM in underdoped cuprates.Comment: Published Versio

    Multi-Document Summarization via Discriminative Summary Reranking

    Full text link
    Existing multi-document summarization systems usually rely on a specific summarization model (i.e., a summarization method with a specific parameter setting) to extract summaries for different document sets with different topics. However, according to our quantitative analysis, none of the existing summarization models can always produce high-quality summaries for different document sets, and even a summarization model with good overall performance may produce low-quality summaries for some document sets. On the contrary, a baseline summarization model may produce high-quality summaries for some document sets. Based on the above observations, we treat the summaries produced by different summarization models as candidate summaries, and then explore discriminative reranking techniques to identify high-quality summaries from the candidates for difference document sets. We propose to extract a set of candidate summaries for each document set based on an ILP framework, and then leverage Ranking SVM for summary reranking. Various useful features have been developed for the reranking process, including word-level features, sentence-level features and summary-level features. Evaluation results on the benchmark DUC datasets validate the efficacy and robustness of our proposed approach
    corecore